In this notebook, we will
import os

import matplotlib.pyplot as plt
import pandas as pd
import scanpy as sc

# Parameters
input_file = "input_adata.h5ad"
output_file = "adata.h5ad"
table_dir = "tables"

markers = pd.read_csv(os.path.join(table_dir, "cell_type_markers.csv"))

adata = sc.read_h5ad(input_file)

random_state = 42
sc.pp.neighbors(adata, n_pcs=20, random_state=random_state)
sc.tl.umap(adata, random_state=random_state)
sc.tl.leiden(adata, resolution=2, random_state=random_state)

computing neighbors
using 'X_pca' with n_pcs = 20
finished
computing UMAP
finished
running Leiden clustering
finished
fig, ax = plt.subplots(figsize=(14, 10))
sc.pl.umap(
    adata, color="leiden", ax=ax, legend_loc="on data", size=20, legend_fontoutline=3
)

for ct in cell_types:
    marker_genes = markers.loc[markers["cell_type"] == ct, "gene_identifier"]
    sc.pl.umap(
        adata, color=marker_genes, title=["{}: {}".format(ct, g) for g in marker_genes]
    )

fig, ax = plt.subplots(figsize=(14, 10))
sc.pl.umap(
    adata, legend_loc="on data", color="leiden", ax=ax, size=20, legend_fontoutline=3
)

Assign clusters to cell types using the following mapping:
annotation = {
    "B cell": [17, 4, 1, 28, 6, 7, 19, 8],
    "CAF": [27],
    "Endothelial cell": [21],
    "Mast cell": [32],
    "NK cell": [0, 18, 31, 26],
    "T cell": [2, 9, 20, 14, 24, 3, 10, 16, 12, 11, 15, 30, 5, 13, 25],
    "myeloid": [22],
    "pDC": [33],
}

sc.pl.umap(adata, color=["cell_type_unknown", "cell_type"])

/opt/conda/lib/python3.8/site-packages/anndata/_core/anndata.py:1192: FutureWarning: is_categorical is deprecated and will be removed in a future version. Use is_categorical_dtype instead
... storing 'cell_type' as categorical
... storing 'cell_type_unknown' as categorical
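
The cell that turns the `annotation` mapping into the `cell_type` column is not shown in this export. The following is a minimal sketch of one way to do it, not necessarily the approach used here. It assumes that Leiden labels are stored as strings in `obs["leiden"]` and that clusters missing from the mapping are labelled `"unknown"`. The `obs` table, `toy_annotation`, and the helper `invert_annotation` are hypothetical stand-ins for illustration.

```python
import pandas as pd


def invert_annotation(annotation):
    """Turn {cell_type: [cluster, ...]} into {cluster_label: cell_type}."""
    cluster_to_type = {}
    for cell_type, clusters in annotation.items():
        for cluster in clusters:
            key = str(cluster)  # assumption: cluster labels are strings in obs
            if key in cluster_to_type:
                raise ValueError(f"cluster {key} is assigned twice")
            cluster_to_type[key] = cell_type
    return cluster_to_type


# Toy stand-in for adata.obs (hypothetical data, one row per cell).
obs = pd.DataFrame({"leiden": pd.Categorical(["0", "27", "23", "33", "2"])})

# Small excerpt of the mapping, for illustration only.
toy_annotation = {"NK cell": [0], "CAF": [27], "pDC": [33], "T cell": [2]}

cluster_to_type = invert_annotation(toy_annotation)

# Clusters that are not listed in the mapping fall back to "unknown".
obs["cell_type"] = obs["leiden"].astype(str).map(cluster_to_type).fillna("unknown")
print(obs)
```

Raising on a duplicate cluster guards against listing one cluster under two cell types, which a plain dictionary inversion would silently overwrite.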
display(
    adata.obs.groupby("cell_type")[["samples"]].count().sort_values("samples"), n=50
)

| cell_type | samples |
|---|---|
| pDC | 71 |
| Mast cell | 86 |
| CAF | 226 |
| myeloid | 523 |
| Endothelial cell | 534 |
| unknown | 662 |
| NK cell | 2820 |
| B cell | 9132 |
| T cell | 14242 |
display(cell_type_fractions, n=50)

| | samples | facs_purity_cd3 | facs_purity_cd56 | frac_t_cell | frac_nk_cell |
|---|---|---|---|---|---|
| 0 | H68 | 0.797 | 0.138 | 0.843984 | 0.129240 |
| 1 | H141 | 0.288 | 0.025 | 0.358058 | 0.024275 |
| 2 | H143 | 0.653 | 0.008 | 0.695455 | 0.002841 |
| 3 | H149 | 0.644 | 0.033 | 0.750663 | 0.053935 |
| 4 | H160 | 0.342 | 0.067 | 0.282983 | 0.035725 |
| 5 | H176 | 0.558 | 0.108 | 0.621974 | 0.125388 |
| 6 | H182 | 0.303 | 0.109 | 0.391644 | 0.105428 |
| 7 | H185 | 0.493 | 0.163 | 0.655858 | 0.195607 |
| 8 | H188 | 0.657 | 0.087 | 0.797382 | 0.073679 |
| 9 | H197 | 0.271 | 0.171 | 0.416143 | 0.212788 |
| 10 | H205 | 0.485 | 0.028 | 0.253820 | 0.040747 |
| 11 | H208 | 0.336 | 0.323 | 0.487741 | 0.295972 |
| 12 | H211 | 0.382 | 0.029 | 0.248805 | 0.026621 |
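
The code that builds `cell_type_fractions` is not part of this export. As a rough sketch, per-sample fractions such as `frac_t_cell` and `frac_nk_cell` could be derived from the annotated cells with `pd.crosstab` and row-wise normalization. The toy `obs` table and the sample names `S1` and `S2` below are hypothetical. The FACS purity columns would have to be merged in from a separate table, which is not shown here.

```python
import pandas as pd

# Toy stand-in for adata.obs (hypothetical data, one row per cell).
obs = pd.DataFrame(
    {
        "samples": ["S1"] * 4 + ["S2"] * 5,
        "cell_type": [
            "T cell", "T cell", "T cell", "NK cell",            # S1
            "T cell", "B cell", "B cell", "B cell", "NK cell",  # S2
        ],
    }
)

# Rows are samples, columns are cell types; each row sums to 1.
fractions = pd.crosstab(obs["samples"], obs["cell_type"], normalize="index")

# Keep the two populations of interest and rename them to match the table.
toy_fractions = (
    fractions[["T cell", "NK cell"]]
    .rename(columns={"T cell": "frac_t_cell", "NK cell": "frac_nk_cell"})
    .reset_index()
)
print(toy_fractions)
```

Normalizing by row makes each fraction relative to all annotated cells of a sample, including cells labelled as other types.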
Text(0, 0.5, '%T cells')
Text(0, 0.5, '%NK cells')
adata.write(output_file, compression="lzf")